An Improved Arabic Word's Roots Extraction Method Using N-Gram Technique
Abstract
The Arabic language is distinguished by its morphological richness, which forces those working in Arabic language processing (e.g., information retrieval, document classification, text summarization) to deal with many words that appear different but in fact derive from the same root. One way to overcome this problem is to reduce words to their roots. This research provides a new algorithm that returns the roots of Arabic words using an n-gram technique, without morphological rules, in order to avoid the complexity arising from the morphological richness of the language on the one hand and the multiplicity of morphological rules on the other. The proposed algorithm uses a list of over 4,500 root words.
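The abstract does not state the n-gram size or the similarity measure, so the following is only a minimal sketch of how rule-free, n-gram-based root matching could work, not the authors' algorithm. It assumes order-preserving character bigrams (including non-adjacent letter pairs) and the Dice coefficient as the similarity score. The names `char_pairs`, `dice`, `extract_root`, and the four-entry `ROOTS` list are hypothetical stand-ins; the paper's list holds over 4,500 roots.

```python
from itertools import combinations

# Hypothetical miniature root list; the paper's list has over 4,500 entries.
ROOTS = ["كتب", "درس", "علم", "لعب"]


def char_pairs(word):
    """Return the set of ordered letter pairs (a before b) found in word.

    Non-adjacent pairs are included (an assumption of this sketch), so the
    letters of a root can still be matched when affixes or infixes
    separate them inside the derived word.
    """
    return {a + b for a, b in combinations(word, 2)}


def dice(pairs_a, pairs_b):
    """Dice coefficient between two sets of letter pairs."""
    if not pairs_a or not pairs_b:
        return 0.0
    return 2 * len(pairs_a & pairs_b) / (len(pairs_a) + len(pairs_b))


def extract_root(word, roots=ROOTS):
    """Return the root most similar to word, or None if nothing overlaps."""
    word_pairs = char_pairs(word)
    best = max(roots, key=lambda root: dice(word_pairs, char_pairs(root)))
    if dice(word_pairs, char_pairs(best)) == 0.0:
        return None
    return best


if __name__ == "__main__":
    for word in ["مكتوب", "مدرسة", "معلم", "يلعبون"]:
        print(word, extract_root(word))
```

No prefix, suffix, or pattern tables are consulted here: the candidate root is chosen purely by letter-pair overlap against the list, which is the kind of rule-free matching the abstract describes. A real system would also need a minimum-score threshold and a tie-breaking policy, neither of which the abstract specifies.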
Similar resources
Classical Arabic Poetry Categorization Using N-gram Frequency Statistics
Most of the Arabic language vocabulary is built by derivation from roots. These roots are words composed of three to five consonant letters. Any Arabic information retrieval task needs to deal first with the language's morphological and structural changes (the so-called stemming process); then a statistical method for extracting information is implemen...
A Study of Association Measures and their Combination for Arabic MWT Extraction
Automatic Multi-Word Term (MWT) extraction is very important for many applications, such as information retrieval, question answering, and text categorization. Although many methods have been used for MWT extraction in English and other European languages, few studies have addressed Arabic. In this paper, we propose a novel, hybrid method which combines linguistic and statistical a...
Towards a new Approach for Arabic root extraction: Exploit relations between the word letters and their placement in the word for Arabic root extraction
This paper presents a new root-extraction approach for Arabic words. The approach tries to assign a unique root to each Arabic word without relying on a database of word roots, a list of word patterns, or a list of all the prefixes and suffixes of Arabic words. Unlike most Arabic rule-based stemmers, it tries to predict the positions of the root letters one by one based on some rules and relati...
A Bio-Inspired Approach for Multi-Word Expression Extraction
This paper proposes a new approach for Multi-word Expression (MWE) extraction, motivated by gene sequence alignment, because a textual sequence is similar to a gene sequence in pattern analysis. The theory of the Longest Common Subsequence (LCS) originates from computer science and has been established as the affine gap model in bioinformatics. We perform this developed LCS technique combined with linguis...
Nahla A Belal, An Efficient Rank Based Arabic Root Extractor
A morphologically rich language such as Arabic requires deep analysis; this is due to its invaluable characteristics, which are beneficial for the task of root extraction. This paper investigates employing new techniques to enumerate and rank possible roots for a given word, using linguistic rules as scoring mechanisms. The proposed tech...
Journal: JCS
Volume: 10
Publication date: 2014